63 research outputs found

A text-to-picture system for Russian language

Author: Ustalov Dmitry
Publication date: 01/01/2012

This paper presents the motivation and design of a general-purpose text-to-picture (TTP) synthesis system. The described TTP system is designed for Russian language processing and operates with three components: a natural language analysis subsystem, a stage processing subsystem, and a rendering subsystem. Every processing stage is described, and the basic design ideas of the system architecture are highlighted. A user study has been performed, and the reasons for further work are explained.

Institutional repository of Ural Federal University named after the first President of Russia B.N.Yeltsin

Best Prompts for Text-to-Image Models and How to Find Them

Author: Pavlichenko Nikita
Ustalov Dmitry
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/06/2023

Recent progress in generative models, especially in text-guided diffusion models, has enabled the production of aesthetically pleasing imagery resembling the works of professional human artists. However, one has to carefully compose the textual description, called the prompt, and augment it with a set of clarifying keywords. Since aesthetics are challenging to evaluate computationally, human feedback is needed to determine the optimal prompt formulation and keyword combination. In this paper, we present a human-in-the-loop approach to learning the most useful combination of prompt keywords using a genetic algorithm. We also show how such an approach can improve the aesthetic appeal of images depicting the same descriptions. Comment: 13 pages (6 main pages), 7 figures, 4 tables, accepted at SIGIR '23 Short Paper Track.
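
The keyword search described in this abstract can be illustrated with a small genetic algorithm over keyword subsets. This is only a minimal sketch under stated assumptions, not the authors' procedure: the function name `evolve_keywords`, the callable `rate` (a stand-in for aggregated human feedback), and all parameter values are hypothetical.

```python
import random


def evolve_keywords(keywords, rate, generations=20, pop_size=12, seed=0):
    """Search for a well-rated subset of prompt keywords.

    `rate` is a hypothetical callable mapping a tuple of keywords to a
    score; in a human-in-the-loop setting it would wrap human judgements.
    Requires at least two keywords and pop_size >= 4.
    """
    rng = random.Random(seed)
    n = len(keywords)

    def decode(mask):
        # A candidate is a tuple of booleans: True means "use this keyword".
        return tuple(k for k, bit in zip(keywords, mask) if bit)

    def fitness(mask):
        return rate(decode(mask))

    # Random initial population of keyword subsets.
    population = [tuple(rng.random() < 0.5 for _ in range(n))
                  for _ in range(pop_size)]

    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]   # truncation selection
        children = [ranked[0]]              # elitism: keep the best as is
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)       # one-point crossover
            child = list(a[:cut] + b[cut:])
            if rng.random() < 0.2:          # occasional bit-flip mutation
                i = rng.randrange(n)
                child[i] = not child[i]
            children.append(tuple(child))
        population = children

    return decode(max(population, key=fitness))
```

In practice every call to `rate` would cost human effort, so a real system would cache ratings and keep the population small; the sketch ignores that cost.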

arXiv.org e-Print Archive

Unsupervised Sense-Aware Hypernymy Extraction

Author: Biemann Chris
Panchenko Alexander
Ponzetto Simone Paolo
Ustalov Dmitry
Publication date: 17/09/2018

In this paper, we show how unsupervised sense representations can be used to improve hypernymy extraction. We present a method for extracting disambiguated hypernymy relationships that propagates hypernyms to sets of synonyms (synsets), constructs embeddings for these sets, and establishes sense-aware relationships between matching synsets. Evaluation on two gold standard datasets for English and Russian shows that the method successfully recognizes hypernymy relationships that cannot be found with standard Hearst patterns and Wiktionary datasets for the respective languages. Comment: In Proceedings of the 14th Conference on Natural Language Processing (KONVENS 2018). Vienna, Austria.

arXiv.org e-Print Archive

Eliminating fuzzy duplicates in crowdsourced lexical resources

Author: Kiselev Yuri
Porshnev Sergey
Ustalov Dmitry
Publication venue: Global WordNet Association
Publication date: 01/01/2016

Collaborative creation of lexical resources is a trending approach to building high-quality thesauri in a short time span at a remarkably low price. The key idea is to invite non-expert participants to express and share their knowledge with the aim of constructing a resource. However, this approach tends to be noisy and error-prone, which makes data cleansing a highly topical task. In this paper, we study different techniques for synset deduplication, including machine- and crowd-based ones. Finally, we put forward an approach that can solve the deduplication problem fully automatically, with quality comparable to the expert-based approach.

MAnnheim DOCument Server

Watset : automatic induction of synsets from a graph of synonyms

Author: Biemann Chris
Panchenko Alexander
Ustalov Dmitry
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/01/2017

This paper presents a new graph-based approach that induces synsets using synonymy dictionaries and word embeddings. First, we build a weighted graph of synonyms extracted from commonly available resources, such as Wiktionary. Second, we apply word sense induction to deal with ambiguous words. Finally, we cluster the disambiguated version of the ambiguous input graph into synsets. Our meta-clustering approach lets us use an efficient hard clustering algorithm to perform a fuzzy clustering of the graph. Despite its simplicity, our approach shows excellent results, outperforming five competitive state-of-the-art methods in terms of F-score on three gold standard datasets for English and Russian derived from large-scale manually constructed lexical resources.
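
The three steps named in this abstract (synonym graph, word sense induction, clustering of the disambiguated graph) can be sketched roughly as follows. This is an illustrative simplification and not the published implementation: edge weights and word embeddings are ignored, connected components stand in for the hard clustering algorithm, and the names `components` and `watset_sketch` are hypothetical.

```python
from collections import defaultdict


def components(nodes, edges):
    """Connected components; a stand-in for any hard clustering algorithm."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        if u in adj and v in adj:
            adj[u].add(v)
            adj[v].add(u)
    seen, result = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            x = stack.pop()
            if x not in comp:
                comp.add(x)
                stack.extend(adj[x] - comp)
        seen |= comp
        result.append(comp)
    return result


def watset_sketch(edges):
    """Induce overlapping synsets from a list of undirected synonym pairs."""
    # Step 1: build the (here unweighted) synonym graph.
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)

    # Step 2: sense induction. Cluster each word's neighbourhood with the
    # word itself removed; every cluster is the context of one sense.
    senses = {}
    for word, nbrs in neighbours.items():
        local = [(u, v) for u, v in edges if u in nbrs and v in nbrs]
        senses[word] = components(nbrs, local)

    def sense_of(u, word):
        # Pick the sense of `u` whose context mentions `word`.
        for j, context in enumerate(senses[u]):
            if word in context:
                return (u, j)

    # Build the disambiguated graph whose nodes are (word, sense index).
    sense_nodes, sense_edges = set(), []
    for word, contexts in senses.items():
        for i, context in enumerate(contexts):
            sense_nodes.add((word, i))
            for u in context:
                sense_edges.append(((word, i), sense_of(u, word)))

    # Step 3: hard-cluster the sense graph, then drop the sense indices.
    return [{w for w, _ in cluster}
            for cluster in components(sense_nodes, sense_edges)]
```

Because an ambiguous word contributes several sense nodes, it can land in more than one output cluster, which is how a hard clustering of the sense graph yields a fuzzy clustering of the original words.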

arXiv.org e-Print Archive

Crossref

Institutional repository of Ural Federal University named after the first President of Russia B.N.Yeltsin

MAnnheim DOCument Server

A spinning wheel for YARN : user interface for a crowdsourced thesaurus

Author: Braslavski Pavel
Mukhin Mikhail
Ustalov Dmitry
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/01/2014

The YARN (Yet Another RussNet) project, started in 2013, aims at creating a large open thesaurus for Russian using crowdsourcing. This paper describes the synset assembly interface developed within the project: the motivation behind it, its design, usage scenarios, implementation details, and first experimental results.

Crossref

MAnnheim DOCument Server

Graph clustering for natural language processing

Author: Ustalov Dmitry
Publication date: 01/01/2018

Graph-based representations have proven to be an effective approach for a variety of Natural Language Processing (NLP) tasks. Graph clustering makes it possible to extract useful knowledge by exploiting the implicit structure of the data. In this tutorial, we will present several efficient graph clustering algorithms and show their strengths and weaknesses as well as their implementations and applications. Then, the evaluation methodology in unsupervised NLP tasks will be discussed.

ZENODO

MAnnheim DOCument Server

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Clustering Without Knowing How To: Application and Evaluation

Author: Fedulov Daniil
Likhobaba Daniil
Ustalov Dmitry
Publication date: 29/03/2023

Crowdsourcing allows running simple human intelligence tasks on a large crowd of workers, which makes it possible to solve problems for which it is difficult to formulate an algorithm or train a machine learning model in reasonable time. One such problem is data clustering by an under-specified criterion that is simple for humans but difficult for machines. In this demonstration paper, we build a crowdsourced system for image clustering and release its code under a free license at https://github.com/Toloka/crowdclustering. Our experiments on two different image datasets, dresses from Zalando's FEIDEGGER and shoes from the Toloka Shoes Dataset, confirm that one can obtain meaningful clusters purely with crowdsourcing, without any machine learning algorithms. Comment: accepted at ECIR 2023 Demonstration Track.

arXiv.org e-Print Archive